Using Error-Correcting Codes for Text Classification
Abstract
This paper explores in detail the use of Error-Correcting Output Coding (ECOC) for learning text classifiers. We show that the accuracy of a Naive Bayes classifier on text classification tasks can be significantly improved by taking advantage of the error-correcting properties of the code. We also explore different kinds of codes, namely error-correcting codes, random codes, and domain- and data-specific codes, and give experimental results for each. The ECOC method scales well to large data sets with a large number of classes. Experiments on a real-world data set show a reduction in classification error of up to 66% over the traditional Naive Bayes classifier. We also compare our empirical results to semi-theoretical results and find that the two closely agree.
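The error-correcting property mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class names and the 7-bit code matrix are hypothetical, and the per-bit binary classifiers (which the paper builds with Naive Bayes) are replaced here by a simulated output vector. Each class gets a codeword, each bit position corresponds to one binary classifier, and a document is assigned to the class whose codeword is nearest in Hamming distance to the predicted bits.

```python
from itertools import combinations

# Hypothetical code matrix: one 7-bit codeword per class.
# Class names are illustrative only; they do not come from the paper.
CODEWORDS = {
    "sports":   (1, 1, 1, 1, 1, 1, 1),
    "politics": (0, 0, 0, 0, 1, 1, 1),
    "science":  (0, 0, 1, 1, 0, 0, 1),
    "business": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codewords):
    """Smallest pairwise Hamming distance between any two codewords."""
    return min(hamming(a, b) for a, b in combinations(codewords.values(), 2))


def decode(bits, codewords):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codewords, key=lambda label: hamming(bits, codewords[label]))


# A code with minimum distance d can correct up to (d - 1) // 2 bit errors.
d = min_distance(CODEWORDS)
correctable = (d - 1) // 2

# Simulated classifier outputs (an assumption standing in for trained binary
# classifiers): the true class is "science", but the classifier for bit 2 errs.
predicted = list(CODEWORDS["science"])
predicted[2] ^= 1
decoded = decode(tuple(predicted), CODEWORDS)

print(f"minimum distance: {d}, correctable bit errors: {correctable}")
print(f"decoded class: {decoded}")
```

In this sketch a single wrong binary decision still decodes to the intended class, because every pair of codewords differs in four positions. With a 1-of-n coding the same single error could change the predicted class.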
Similar resources
Multi-class Classification with Error Correcting Codes
Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. I...
Multi-class Text Categorization with Error Correcting Codes
Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. In t...
One-point Goppa Codes on Some Genus 3 Curves with Applications in Quantum Error-Correcting Codes
We investigate one-point algebraic geometric codes C_L(D, G) associated to maximal curves recently characterized by Tafazolian and Torres given by the affine equation y^l = f(x), where f(x) is a separable polynomial of degree r relatively prime to l. We mainly focus on the curve y^4 = x^3 + x and Picard curves given by the equations y^3 = x^4 - x and y^3 = x^4 - 1. As a result, we obtain exact value of min...
KDD Project Report Using Error-Correcting Codes for Efficient Text Classification with a Large Number of Categories
We investigate the use of Error-Correcting Output Codes (ECOC) for efficient text classification with a large number of categories and propose several extensions which improve the performance of ECOC. ECOC has been shown to perform well for classification tasks, including text classification, but it still remains an under-explored area in ensemble learning algorithms. We explore the use of erro...
Classification of EEG-based motor imagery BCI by using ECOC
Accuracy in identifying the subjects' intentions for moving their different limbs from EEG signals is regarded as an important factor in the studies related to BCI. In fact, the complexity of motor imagination and the low signal-to-noise ratio of EEG signals make this identification a difficult task. In order to overcome these complexities, many techniques such as variou...